    Towards a Theoretical Analysis of PCA for Heteroscedastic Data

    Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but is also sensitive to outliers. PCA is also known to have a phase transition when noise is independent and identically distributed; recovery of the subspace sharply declines at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers). Comment: Presented at 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton
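
    As a rough illustration of the setting this abstract describes, the sketch below simulates recovery of a one-dimensional subspace by PCA under two noise profiles with the same average variance. It is an empirical check only and does not reproduce the paper's asymptotic prediction. The function name `pca_recovery`, the dimensions, and the noise levels are assumptions chosen for illustration.

```python
import numpy as np

def pca_recovery(noise_var, d=100, seed=0):
    """Squared inner product between the true direction and PCA's estimate.

    noise_var: (n,) array of per-sample noise variances (illustrative input).
    """
    rng = np.random.default_rng(seed)
    n = noise_var.size
    u = np.zeros(d)
    u[0] = 1.0                                   # true one-dimensional subspace
    z = rng.standard_normal(n)                   # unit-variance scores
    noise = rng.standard_normal((d, n)) * np.sqrt(noise_var)
    Y = np.outer(u, z) + noise                   # one sample per column
    # The data are zero-mean by construction, so no centering is applied.
    # The leading left singular vector is the first principal component.
    u_hat = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
    return float((u @ u_hat) ** 2)

n = 1000
homo = np.full(n, 1.0)                           # every sample has variance 1
hetero = np.where(np.arange(n) < 900, 0.1, 9.1)  # average variance is still 1
rec_homo = pca_recovery(homo)
rec_hetero = pca_recovery(hetero)
print(f"homoscedastic: {rec_homo:.2f}, heteroscedastic: {rec_hetero:.2f}")
```

    In this particular configuration, a small fraction of very noisy samples (acting much like outliers) is enough to pull the leading component away from the true direction.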

    Optimally Weighted PCA for High-Dimensional Heteroscedastic Data

    Modern applications increasingly involve high-dimensional and heterogeneous data, e.g., datasets formed by combining numerous measurements from myriad sources. Principal Component Analysis (PCA) is a classical method for reducing dimensionality by projecting such data onto a low-dimensional subspace capturing most of their variation, but PCA does not robustly recover underlying subspaces in the presence of heteroscedastic noise. Specifically, PCA suffers from treating all data samples as if they are equally informative. This paper analyzes a weighted variant of PCA that accounts for heteroscedasticity by giving samples with larger noise variance less influence. The analysis provides expressions for the asymptotic recovery of underlying low-dimensional components from samples with heteroscedastic noise in the high-dimensional regime, i.e., for sample dimension on the order of the number of samples. Surprisingly, it turns out that whitening the noise by using inverse noise variance weights is suboptimal. We derive optimal weights, characterize the performance of weighted PCA, and consider the problem of optimally collecting samples under budget constraints. Comment: 52 pages, 13 figures
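
    The idea of giving noisier samples less influence can be sketched as an eigendecomposition of a weighted sample covariance. The code below is a minimal sketch under stated assumptions, not the paper's method: the helper `weighted_pca` and the simulated data are hypothetical, and the inverse noise variance weights are used only as a familiar baseline, which the abstract itself describes as suboptimal. The paper's optimal weights are not reproduced here.

```python
import numpy as np

def weighted_pca(Y, weights, k):
    """Top-k eigenvectors of the weighted sample covariance.

    Y: (d, n) array with one sample per column.
    weights: (n,) nonnegative per-sample weights.
    """
    w = np.asarray(weights, dtype=float)
    # Weighted sample covariance: (1/n) * sum_i w_i * y_i y_i^T
    C = (Y * w) @ Y.T / Y.shape[1]
    eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, order]

# Simulated data (assumed setup): half the samples are clean, half are noisy.
rng = np.random.default_rng(0)
d, n = 50, 2000
u = np.zeros(d)
u[0] = 1.0                                       # true component
sigma2 = np.where(np.arange(n) < n // 2, 0.1, 10.0)
Y = np.outer(u, rng.standard_normal(n)) \
    + rng.standard_normal((d, n)) * np.sqrt(sigma2)

u_unweighted = weighted_pca(Y, np.ones(n), 1)[:, 0]
u_invvar = weighted_pca(Y, 1.0 / sigma2, 1)[:, 0]   # baseline weights only
rec_unweighted = float((u @ u_unweighted) ** 2)
rec_invvar = float((u @ u_invvar) ** 2)
print(f"unweighted: {rec_unweighted:.2f}, inverse-variance: {rec_invvar:.2f}")
```

    Setting all weights to one recovers ordinary (uncentered) PCA, so the same function can be used to compare any candidate weighting against the unweighted estimate.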

    Convolutional Analysis Operator Learning: Dependence on Training Data

    Convolutional analysis operator learning (CAOL) enables the unsupervised training of (hierarchical) convolutional sparsifying operators or autoencoders from large datasets. One can use many training images for CAOL, but a precise understanding of the impact of doing so has remained an open question. This paper presents a series of results that lend insight into the impact of dataset size on the filter update in CAOL. The first result is a general deterministic bound on errors in the estimated filters, and is followed by a bound on the expected errors as the number of training samples increases. The second result provides a high probability analogue. The bounds depend on properties of the training data, and we investigate their empirical values with real data. Taken together, these results provide evidence for the potential benefit of using more training data in CAOL. Comment: 5 pages, 2 figures

    Kinematic Analysis of Prey Capture in Coastal Giant Salamanders (Dicamptodon tenebrosus)

    Salamanders use a variety of techniques to capture prey that involve a combination of lingual and jaw prehension. For example, some plethodontid salamanders use ballistic tongue projection to capture prey. Salamanders of the family Dicamptodontidae are the largest terrestrial salamanders in the world and feed on a diverse array of prey items (arthropods, annelids, small mammals, and reptiles). The objectives of our study were to describe and quantify the behavior of terrestrial adult coastal giant salamanders (D. tenebrosus). While there has been much research conducted on aquatic phase D. tenebrosus, little is known about their terrestrial counterparts. Feeding bouts on three distinct prey types (crickets, earthworms, and slugs) were recorded with high-speed video (420-1000 frames/second) using a Casio Exilim EX-ZR100 digital camera. For each feeding trial, salamanders were offered a prey item with forceps. Trials were repeated on separate days, with each salamander (N=12) exposed to equal ratios of prey items. Videos were analyzed for velocity of initial strike, lingual projection, lower and upper jaw prehension, and feeding success. Non-metric multi-dimensional scaling analysis indicated significant differences in feeding patterns among prey types. Lingual prehension was the predominant method of ingestion when a small prey item (crickets) was offered, whereas the upper and lower mandibles were used in a snapping motion with larger prey items (earthworms). Future work will incorporate different prey items and examine prey preference and foraging behaviors of D. tenebrosus. Additionally, some comparative analysis of the mechanics of prey capture in amphibian taxa will be conducted using the tiger salamander (Ambystoma tigrinum) and the tailed frog (Ascaphus truei).

    Streaming Probabilistic PCA for Missing Data with Heteroscedastic Noise

    Streaming principal component analysis (PCA) is an integral tool in large-scale machine learning for rapidly estimating low-dimensional subspaces of very high dimensional and high arrival-rate data with missing entries and corrupting noise. However, modern trends increasingly combine data from a variety of sources, meaning they may exhibit heterogeneous quality across samples. Since standard streaming PCA algorithms do not account for non-uniform noise, their subspace estimates can quickly degrade. On the other hand, the recently proposed Heteroscedastic Probabilistic PCA Technique (HePPCAT) addresses this heterogeneity, but it was not designed to handle missing entries and streaming data, nor does it adapt to non-stationary behavior in time series data. This paper proposes the Streaming HeteroscedASTic Algorithm for PCA (SHASTA-PCA) to bridge this divide. SHASTA-PCA employs a stochastic alternating expectation maximization approach that jointly learns the low-rank latent factors and the unknown noise variances from streaming data that may have missing entries and heteroscedastic noise, all while maintaining a low memory and computational footprint. Numerical experiments validate the superior subspace estimation of our method compared to state-of-the-art streaming PCA algorithms in the heteroscedastic setting. Finally, we illustrate SHASTA-PCA applied to highly heterogeneous real data from astronomy. Comment: 19 pages, 6 figures

    CometChip: A High-throughput 96-Well Platform for Measuring DNA Damage in Microarrayed Human Cells

    DNA damaging agents can promote aging, disease, and cancer; they are ubiquitous in the environment and are also produced within human cells as normal cellular metabolites. Ironically, at high doses DNA damaging agents are also used to treat cancer. The ability to quantify DNA damage responses is thus critical in the public health, pharmaceutical and clinical domains. Here, we describe a novel platform that exploits microfabrication techniques to pattern cells in a fixed microarray. The ‘CometChip’ is based upon the well-established single cell gel electrophoresis assay (a.k.a. the comet assay), which estimates the level of DNA damage by evaluating the extent of DNA migration through a matrix in an electrical field. The types of damage measured by this assay include abasic sites, crosslinks, and strand breaks. Instead of being randomly dispersed in agarose as in the traditional assay, cells are captured into an agarose microwell array by gravity. The platform also expands from the size of a standard microscope slide to a 96-well format, enabling parallel processing. Here we describe the protocols for using the chip to evaluate DNA damage caused by known genotoxic agents and the cellular repair response that follows exposure. Through the integration of biological and engineering principles, this method potentiates robust and sensitive measurements of DNA damage in human cells and provides the necessary throughput for genotoxicity testing, drug development, epidemiological studies and clinical assays. National Institute of Environmental Health Sciences (Training Grant in Environmental Toxicology T32-ES007020); Massachusetts Institute of Technology. Center for Environmental Health Sciences (P30-ES002109); National Institute of Environmental Health Sciences (5-UO1-ES016045); National Institute of Environmental Health Sciences (1-R21-ES019498); National Institute of Environmental Health Sciences (R44-ES021116

    Reconstruction of 3D Whole-Body PET Data Using Blurred Anatomical Labels

    The diagnostic utility of whole-body PET is often limited by the high level of statistical noise in the images. An improvement in image quality can be obtained by incorporating correlated anatomical information during the reconstruction of the PET data. The combined PET/CT (SMART) scanner allows the acquisition of accurately aligned PET and CT whole-body data. The authors present results of incorporating aligned anatomical information from the CT during the reconstruction of 3D whole-body PET data. They use the FORE+PWLS method for the reconstruction and a label model to incorporate anatomical information via penalty weights. Since in practice mismatches between anatomical and functional data are unavoidable, the labels are “blurred” to reflect the uncertainty associated with the anatomical information. Results show the potential advantage of incorporating anatomical information by using blurred labels with the penalty weights. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/85864/1/Fessler153.pd